Group 20
Aliya Mosha (s2206786)
Saioa Galvin (s2516907)
Hugh Graham (s2318382)
In this statistical report, we investigate the techniques used by the forensic services at Police Scotland on blood alcohol computations in criminal cases associated with drink driving. In UK law, it is an offence to drive with blood alcohol concentration (BAC) above the prescribed limit. Since alcohol levels fall over time, the measured BAC at the police station is lower than at the time of driving; forensic scientists ‘back-calculate’ the earlier level using an elimination rate (Forensic Science Regulator 2020).
The first method implemented by Police Scotland is the so-called \(\beta\) elimination rate, which measures how quickly alcohol is eliminated from the bloodstream. Using available data, a distribution for \(\beta\) can be constructed, and the 2.5th and 97.5th percentiles of this distribution are used as the limits for this range. Currently, if an individual has a blood alcohol concentration above the legal limit, calculated using the lower 2.5th percentile of the \(\beta\) distribution, the forensic team concludes that they are above the limit. We assessed whether this approach fairly represents uncertainty and whether a clearer reporting standard could improve transparency.
Using the provided dataset, we examined how \(\beta\) varies based on individual characteristics (weight, height, age, sex) and compared several modelling strategies: using empirical percentiles, parametric fitting, sex-specific shifts, linear models, and quantile regression for the 2.5% of \(\beta\).
In practice, we found that \(\beta\) shows significant variation across individuals, suggesting treating \(\beta\) as a single fixed range for everyone ignores these variabilities. Further, estimates are better represented with uncertainty rather than using a single cutoff, as is currently implemented. These findings support the idea that estimates should be coupled with uncertainty, and that a probability of exceeding the legal limit, rather than a binary verdict seems a more fitting approach.
We also considered situations where no blood sample is available and Widmark’s equation must be used along with eyewitness dose, which introduces the volume of distribution \(V_d\). The standard approach treats \(\beta\) and \(V_d\) as independent and applies fixed percentile ranges for both.
We found that \(V_d\) and \(\beta\) are negatively correlated in the data. The common practice of combining extreme marginal percentiles for \(\beta\) and \(V_d\) while assuming independence can therefore produce implausible joint scenarios.
Based on our findings, we encourage expert reports used in jurisdiction cases to include:
the measured BAC
the elapsed time since driving
the estimated BAC at time of driving (\(C_0\))
a 95% uncertainty interval for \(C_0\) derived from the fitted \(\beta\) distribution
the probability that \(C_0\) exceeded the legal limit.
Further, we recommend that when Widmark’s equation is used, reports should specify the assumed distributions for \(\beta\) and \(V_d\), along with acknowledgement of their relationship.
The \(\beta\) elimination rate (measured in g/kg/hr) measures how quickly alcohol is eliminated from the bloodstream, and is used by forensic scientists at Police Scotland when ‘back-calculating’ the BAC at the time of offence (Forensic Science Regulator 2020). This calculation is of form
\[C_0 = C_t +\beta t,\]
where \(C_t\) is the measured BAC at time \(t\) hours after the alleged offence, and \(C_0\) is the estimated BAC at the time of driving.
Currently, forensic scientists use a single, fixed range of \(\beta\) values for all individuals. This approach simplifies practical implementation but does not account for variability due to characteristics of the individual. In particular, factors such as age, gender, weight, or height could influence the rate at which alcohol is eliminated from the bloodstream.
Hence, it is worth exploring whether \(\beta\) can be more effectively modelled to reflect the variability in individual’s characteristics, rather than relying on a fixed population range.
The dataset analysed in this study was taken from a PhD thesis, and contains measurements associated with BAC (Armbrecht 2008). In particular, the dataset contains 100 observations with characteristic information about the individual, and details about the dose of alcohol consumed by the individual.
Key variables that are explored in this report include:
Beta60: the negative of the \(\beta\) elimination rate (g/kg/hr), though
for our analysis we took the corresponding positive variable
beta;
C0: the concentration of alcohol in the blood at
time zero;
weight: the weight (in kg) of a given
individual;
height: the height (in cm) of a given
individual;
age: the age (in years) of a given
individual;
sex: the sex of a given individual;
Amount of alcohol consumed: the dose (in g) of
alcohol consumed.
In order to understand and evaluate the performance of the \(\beta\) elimination rate, we first examine various plots to visualise how the rate is influenced by certain characteristics. The characteristics of interest are weight, height, age, and sex.
These are each plotted against the \(\beta\) elimination rate. Intuitively a larger \(\beta\) value corresponds to faster elimination of alcohol from the blood stream. Figures 1-4 below explore these relationships.
From Figures 1-4, we can deduce the following:
as weight increases the elimination rate decreases linearly;
similarly, as height increases the elimination rate decreases linearly;
the majority of age data is scattered between 20-30, however there is a reasonably constant spread of values, suggesting age is less influential on alcohol elimination;
there is a clear difference in elimination rate between genders, with males typically obtaining higher values.
To further enhance our investigation into the effect each of these characteristics has on the elimination rate, the following table evaluates the significance of correlation between each characteristic and \(\beta\).
| Characteristic | Correlation | P_value |
|---|---|---|
| age | 0.041 | 0.68772 |
| height | -0.410 | 0.00002 |
| weight | -0.356 | 0.00027 |
From this output, we can see that weight and height have significant correlations with the \(\beta\) elimination rate, with respective correlations of -0.356 and -0.410. This suggests that these characteristics play a crucial role in determining the rate at which alcohol is eliminated from the blood stream. Further, the correlation test returned a p-value below 0.05 for both of these characteristics, indicating that there is evidence to reject the null hypothesis that there is no correlation between these characteristics and \(\beta\). As expected, there is a low correlation between age and elimination rate, and a large p-value of 0.68772 further validates this. Hence, we can deduce that age plays little role in the elimination rate of alcohol from the blood stream.
Due to the clear relationships between \(\beta\) and characteristics of the tested individuals, it is worth considering approaches in which these factors are incorporated to accurately predict the rate at which alcohol is eliminated from the blood stream.
An issue with creating a distribution from the \(\beta\) elimination rate values given is that we only have 100 data points. Therefore the distribution is very dependent on not many values. We propose instead investigating if \(\beta\) follows a standard distribution. First, we will examined the general spread of \(\beta\) values given in the following histogram, where the mean, 2.5th, and 97.5th quantiles are indicated in red.
We have seen from exploratory data analysis that the rate at which alcohol density in the blood decreases is dependent on a range of variables given in the data. As Sex is the only discrete variable, we can split the data in two to see how \(\beta\), the alcohol elimination rate, is distributed for males and females in the dataset given.
These visualisations can point us towards a few different distributions. We can analyse how well the data fits the distributions in the following plots.
The aim of finding a distribution to fit \(\beta\) is to smooth out the distribution to decrease the influence of a few outliers in the original dataset, as there are only 100 values. That means that a single datapoint can skew the \(2.5%\) quantile that we are aiming to find. We are also focused on simplicity, so have only looked at 3 different distributions that have the general shape of the data, shown in the figures in the section above. Our method selection also relies on simplicity, as we do not want to over-fit our chosen distribution on such a small dataset. To evaluate how well our data fits our chosen distributions, we have analysed the residual (\(\epsilon_i\)) graphs below.
Empirical and Theoretical Densities: The theoretical
peak value is skewed compared to the empirical peak, and has lower
density.
Q-Q plot: Lack-of-fit observed at distribution
tails. The left tail is heavy while the right tail is light.
Empirical and Theoretical CDFs: Poor fit at central
values of the distribution.
P-P plot: Lack-of-fit at the distribution centre, as
the central values deviate from the linear relationship.
Empirical and Theoretical Densities: Improved fit of
peak is observed, while still not completely aligned.
Q-Q plot: Points are randomly distributed above and
below the general linear trend, with outliers at the lower
tail.
Empirical and Theoretical CDFs: Improved fit at
distribution centre compared to normal.
P-P plot: Improved fit at distribution centre
compared to normal.
Empirical and Theoretical Densities: Best fit of
peak of the analysed distributions.
Q-Q plot: Points are randomly distributed above and
below the general linear trend across all quantiles.
Empirical and Theoretical CDFs: Best fit of
theoretical distribution to data.
P-P plot: Majority of points lie on fitted
line.
After looking at the residuals, we conclude that Gamma is the most appropriate distribution for our \(\beta\) with the fitting method.
The 2.5% quantile in the gamma distribution of \(\beta\) is 0.1257847. The Density graph of the gamma distribution across the whole population of Gamma is shown below to demonstrate fit, again with the mean, 2.5th, and 97.5th quantiles indicated in red.
Considering that Sex is a factor variable, we can find the distributions of beta_60 dependent on whether the suspect is male of female.
The 2.5% quantile in the gamma distribution for males is 0.1219521, and for females it is 0.1430605.
Instead of fitting a distribution, we can create a linear model. This creates a distribution for \(\beta\) given a range of information about the suspect, whether this information is in the form of discrete or continuous form. This is more useful than the method that finds a different beta distribution for males and females, as it is not possible to do the same for continuous information such as weight. Therefore the previous methods don’t make the most of all the information given.
We can create the following model, where \(\beta_j\) represent the coefficients for \(j \in \{0, ..., 4\}\) to model \(Y_i\), the \(i\) datapoints of \(\beta\):
\[ \begin{aligned} \text{linear model}: \quad Y_i &= \beta_0 + \beta_1\text{weight}_i + \beta_2\text{age}_i + \beta_3\text{height}_i + \beta_4\text{sex}_i + \epsilon_i, \\ \epsilon_i &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]
Residual vs Fitted: There is a change of variance
with mean, so the constant variance assumption is slightly violated.
Unfortunately there is little data points to establish if this is an
important violation.
Q-Q Residuals: The ordered standardised residuals
are plotted against quantiles of a standard normal. Lack-of-fit observed
at distribution tails. The left tail is heavy while the right tail is
light.
Scale-Location: We have random variation around the
mean, so there is no violation of the constant variation
assumption.
Residuals vs Leverage: This plot gives us Cook’s
distance, which measures the change in all model fitted values on
omission of the data point in question. From the plot we can see there
are no outliers, so there aren’t any highly influential points in our
data.
To try to resolve these violations of the assumptions on residuals,
we can create an alternative linear model. Given that there is a
quadratic shape in the Residuals vs Fitted graph, we create
the following linear model.
\[ \begin{aligned} \text{square root linear model}: \quad \sqrt{Y_i} &= \beta_0 + \beta_1\text{weight}_i + \beta_2\text{age}_i + \beta_3\text{height}_i + \beta_4\text{sex}_i + \epsilon_i, \\ \epsilon_i &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]
We get the following residual graphs.
The Q-Q Residuals plot is improved, however there is
only marginal change in the Residuals vs Fitted plot.
Therefore we explore alternative methods.
The biggest issue with the linear model above for use in our project is that it’s aim is to find the conditional mean function, \(E(\beta | x)\), where \(x\) represents the explanatory variables of the model. The aim of our analysis is to find the 2.5% quantile of \(\beta\). It is more accurate to use quantile regression (Rodriguez and Yao 2017). We use the same linear model as above.
Quantile of whole distribution of y|x, instead of quantile of expectation of y|x.
While it may look like taking the 2.5% quantile of the linear model, and using the quantile regression model produce very similar results, the methods are different. To find the 2.5% quantile of the linear model, we find the 2.5% quantile of the residuals, and add that onto \(E(\beta | x)\) predicted by our model. This assumes homoscedaticity, that the residuals are independent of predictors, and we have shifted the line uniformly, without taking into account how residual quantile could change across the data (Rodriguez and Yao 2017). Our quantile regression method does not have a homoscedaticity assumption, which is more reasonable, especially as the residual plots of our linear model are questionable in upholding the homoscedaticity assumption.
The current forensic approach uses a single threshold for the elimination rate, specifically the 2.5th percentile of the empirical or fitted distribution of \(\beta\). This value is substituted into
\[ C_0 = C_t + \beta t \] to obtain a point estimate of the blood alcohol concentration at the time of driving. A binary decision is then made as to whether this estimated \(C_0\) exceeds the legal limit. While this approach is simple, it disregards uncertainty in \(\beta\) and in the model used to derive it. Consequently, it may give a misleading impression of precision in borderline cases.
To provide a more transparent and scientifically defensible analysis, we propose that Police Scotland adopt a probabilistic reporting framework, which explicitly quantifies the uncertainty in \(\beta\) and its effect on the estimated blood alcohol concentration \(C_0\).
Given that we have found \(\beta\) to follow a Gamma distribution,
\[ \beta \sim Gamma(\hat{\alpha}, \hat{\lambda}) \]
where \(\hat{\alpha} = 31.95\) and \(\hat{\lambda} = 173.71\), we can propagate this uncertainty to obtain the distribution of \(C_0 = C_t + \beta t\).
A simple Monte Carlo procedure can be used to estimate the probability that the initial true blood alcohol concentration was above the legal limit \(x = 0.47 g/kg\):
\[ \begin{eqnarray} \mathbb{P}(C_0 > x) &=& \mathbb{P}(C_t + \beta t > x) \\ &=& \mathbb{P}\left(\beta > \frac{(x - C_t)}{t}\right) \end{eqnarray} \]
This could also be computed directly from the fitted Gamma distribution:
\[ \mathbb{P}(C_0 > x) = 1 - F_{Gamma}\left(\frac{(x - C_t)}{t} ; \hat{\alpha}, \hat{\lambda}\right) \]
where \(F_{Gamma}\) is the cumulative distribution function of the Gamma distribution.
This probability offers a more nuances summary than a simple ‘above/below limit’ approach. For example, if \(\mathbb{P}(C_0 > x) = 0.78\), the expert witness could state that there is a 78% probability that the suspect’s blood alcohol concentration exceeded the legal limit at the time of driving.
Alongside the probability, a prediction interval for \(C_0\) could be presented:
\[ C_0 \in [C_t + q_{0.025}(\beta) t, C_t + q_{0.975}(\beta) t] \]
This provides a range of plausible values for \(C_0\) given the uncertainty in \(\beta\). For example, using \(C_t = 0.15 g/kg\) and \(t = 2\) hours, a 95% prediction/credible interval for the blood alcohol concentration at the time of driving, \(C_0\), would be (0.4, 0.66) g/kg.
| Statistic | Value |
|---|---|
| Interval length | 0.254 |
| Lower 95% bound for C₀ | 0.402 |
| Upper 95% bound for C₀ | 0.656 |
| Analytical P(C₀ > 0.47 g/kg) | 0.761 |
| Simulated P(C₀ > 0.47 g/kg) | 0.748 |
| Mean of simulated C₀ | 0.514 |
| SD of simulated C₀ | 0.064 |
Quantile regression produces a 2.5% and 97.5% quantile for each possible combination of explanatory variables, so we have to interpret a confidence interval of possible elimination rates differently here. To give an indication of the range of beta under quantile regression, we can look at the mean at each quantile.
| Statistic | Value |
|---|---|
| Interval Length | 0.220 |
| Lower 95% bound for C₀ | 0.415 |
| Upper 95% bound for C₀ | 0.635 |
| Simulated P(C₀ > 0.47 g/kg) | 0.753 |
| Mean of simulated C₀ | 0.525 |
| SD of simulated C₀ | 0.063 |
For the 70-year-old female, given \(C_t = 0.15 g/kg\) measured 2 hours after arrest and the fitted Gamma model for the elimination rate, the estimated 95% prediction/credible interval for the blood alcohol concentration at the time of driving is \(C_0 \in (0.40,0.66)g/kg\). The probability that the true \(C_0\) exceeded the legal limit of 0.47 g/kg is approximately 76%.
Under the current forensic convention (using \(\beta = 0.126\)), the individual would be reported as below the limit. However, our uncertainty-based model reveals a substantial probability that her true level was above the limit, suggesting that a probabilistic or interval-based report would convey the evidence more transparently.
This approach visually presents the relationship between the elimination rate, \(\beta\), and the estimated blood alcohol concentration at the time of driving, \(C_0 = C_t + \beta t\).
The figure combines several elements:
This graphical presentation effectively communicates two key ideas:
The uncertainty surrounding the elimination rate \(\beta\), as represented by its fitted probability distribution; and
The degree of confidence that the individual’s true \(C_0\) was above or below the legal threshold.
All \(\beta\) values to the right of the point \((\beta_{\text{limit}}, C_{\text{legal}})\) represent plausible elimination rates which, if true, would imply that the individual was over the legal limit. The proportion of the red shaded area under the curve quantifies this probability directly - providing an intuitive visual link between statistical uncertainty and the forensic question of interest.
To compare the performance of our different approaches outlined above we consider the following case study.
A 70 year old female (weight: 70kg, height: 160cm) is arrested after being stopped by the police while driving. She provides a blood sample to the police 2 hours after her arrest which gives a reading of Ct = 0.15g/kg (i.e. grams of alcohol per kilogram of blood). The legal limit is x = 0.47g/kg. The police wish to charge her with the criminal offence of driving while over the legal limit.
We can find the \(2.5%\) quantile of the different beta distributions. Why?
| Approach | Lower (2.5%) | Central | Upper (97.5%) |
|---|---|---|---|
| Population (empirical quantiles) | 0.403 | 0.509 | 0.654 |
| Gamma Fitted | 0.402 | 0.514 | 0.656 |
| Male Gamma Fitted | 0.394 | 0.495 | 0.620 |
| Female Gamma Fitted | 0.436 | 0.547 | 0.684 |
| Linear Regression | 0.436 | 0.560 | 0.684 |
| Quantile Regression | 0.343 | 0.429 | 0.514 |
The legal limit is 0.47g/kg.
If it is too late for a blood or breath test to predict accurate results for \(C_0\), but there is eyewitness evidence measuring the quantity of alcohol consumed, the Widmark’s equation can help to calculate the blood alcohol concentration \(t\) hours after drinking (\(C_t\)). The validity of this estimate is dependent on the reliability of the eyewitness’ testimony, however this analysis can be combined with other evidence to support a case. Widmark’s equation can be written as
\[ C_t = \frac{A}{\text{Weight} \times V_d} - \beta t, \]
where \(A\) is the observed does of alcohol consumed, \(\text{Weight}\) is the weight of the individual, and \(V_d\) is a parameter known as the volume of distribution, “a proportionality constant that relates the total amount of drug (in our case alcohol) in the body to the plasma concentration of the drug at a given time.” (Mansoor and Mahabadi 2023) \(V_d\) is measured with the units L/kg. Rearranging this equation we find
\[ C_0 = C_t +\beta_t = \frac{A}{\text{Weight} \times V_d}, \]
which in turn implies
\[ V_d = \frac{A}{\text{Weight} \times C_0}. \]
Hence, given we are provided the data for \(A\), \(\text{Weight}\), and \(C_0\), we can calculate \(V_d\) using the above equation.
Similar to our approach in analysing the effectiveness of the \(\beta\) elimination rate, we can investigate the relationships between \(V_d\) and characteristics of individuals (weight, height, age, and sex). It is also worth investigating the distribution of \(V_d\) values across our sample.
Figures 5-9 help us visualise these relationships.
From Figure 5, we can see that the majority of \(V_d\) values lie around 0.65, in a roughly normally distributed pattern, with the mean, 2.5th, and 97.5th quantiles highlighted in red. There appears to be one particular outlier, with a \(V_d\) value of around 1.2, which may be worth investigating further.
Figures 6-9 highlight the following relationships:
\(V_d\) shows a very slight negative trend as weight increases;
\(V_d\) appears constant across the range of heights;
\(V_d\) appears to decrease with age, although again the majority of the data lies between ages of 20-30, so general conclusions are hard to draw from the sample;
\(V_d\) appears constant across both genders.
The above findings suggest that \(V_d\) seems to differ little across various characteristics of the individual. This suggests \(V_d\) is a more generalised approach compared to the \(\beta\) elimination rate.
It is also worth considering how the two rates are correlated, as independence is assumed between \(\beta\) and \(V_d\). Performing a correlation test between the two can help to deduce whether this assumption is upheld. Further, Figure 10 helps to visualise the relationship between \(\beta\) and \(V_d\).
##
## Pearson's product-moment correlation
##
## data: data$beta and data$Vd
## t = -4.3675, df = 98, p-value = 3.122e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5559863 -0.2250757
## sample estimates:
## cor
## -0.4036489
From the correlation test, we find that there is an observed correlation of -0.404, and the null hypothesis that the true correlation is equal to zero, i.e. the two measures are independent, can be confidently rejected as there is a corresponding p-value of \(3.122\times10^{-5}\).
Figure 10 further highlights this clear relationship between the two measures, and hence the independence assumption appears to be violated.
The current method offers several key strengths that make it well-suited for forensic applications. Its straightforward approach is both easy to implement and readily explained in court proceedings, which is essential for ensuring that evidence is understood by legal professionals and juries. By using the 97.5th percentiles, the method takes a conservative stance that provides defendants with the benefit of the doubt, aligning with fundamental principles of justice. The standardised nature of this approach ensures consistency across all cases, preventing arbitrary variations in how evidence is evaluated. Furthermore, the methodology’s transparency and reproducibility give it scientific rigour, allowing independent verification of results and supporting its credibility in legal contexts.
The current forensic method assumes independence between \(\beta\) and \(V_d\) when calculating the range for \(C_t\). However, a Pearson’s product-moment correlation analysis reveals a statistically significant negative correlation between these two parameters (correlation coefficient = -0.4036489, p-value = 3.122e-05 < 0.05). This finding directly contradicts the independence assumption underlying the current approach.
Although the official forensic procedure specifies using the 97.5th percentile for both \(\beta\) and \(V_d\), this convention is based on policy rather than mathematical reasoning. Forensic laboratories apply the same “upper percentile rule” across all parameters to maintain consistency and to err on the side of caution, as using higher parameter values for \(\beta\) and \(V_d\) generally produces lower blood alcohol estimates. Mathematically Widmark’s equation shows that \(C_t\) decreases as \(\beta\) and \(V_d\) both increase, thus the maximum possible value of \(C_t\) would occur for low \(\beta\) and low \(V_d\) values (their 2.5th percentiles). Regardless of which tail is used, the critical statistical issue with the current method is that it assumes that \(\beta\) and \(V_d\) are independent, when our analysis shows that \(\beta\) and \(V_d\) have a significant negative correlation.
Assuming independence allows forensic analysts to combine their most extreme values (e.g. highest \(\beta\) with highest \(V_d\)) even though such an extreme combination almost never occurs in practice. This may produce inaccurate \(C_t\) estimates, exaggerating how high a person’s blood alcohol concentration might have been at the time of interest or significantly underestimating. In statistical terms, the current approach treats the joint distribution of \((\beta, V_d)\) as the product of their marginal distributions, when in reality the joint distribution should account for their negative dependence structure.
The above plot of the joint distribution between \(\beta\) and \(V_d\) shows that there are no points in the top-right corner where both parameters are at their 97.5th marginal quantile simultaneously.
## 2.5% 50% 97.5%
## 0.658526 1.014769 1.455071
| Method | Lower_2.5 | Median | Upper_97.5 |
|---|---|---|---|
| Empirical joint (β,Vd) | 0.659 | 1.015 | 1.455 |
| Independent 97.5th percentiles | 0.497 |
The empirical joint \(C_t\) distribution represents the upper and lower bounds of plausible blood alcohol concentrations for an individual with weight 70kg, who consumed the mean grams of alcohol from the sample and was tested 2 hours later.
As seen from the table above this resulted in a range (0.659, 1.455), compared to \(C_t\) when assuming independence of \(\beta\) and \(V_d\) which gave a value of 0.497 which is outside the range of plausible values computed using the joint distribution. This highlights the idea that the current method may be too conservative, resulting in underestimated values of the actual blood alcohol levels of a person at the time of interest.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(afex))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(kableExtra))
suppressPackageStartupMessages(library(cv))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(fitdistrplus))
suppressPackageStartupMessages(library(readxl))
suppressPackageStartupMessages(library(quantreg))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(broom))
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% TASK 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
data <- read_excel("SCS_BAC_and_BrAC_split_TOP.xlsx")
data$sex <- as.factor(data$Sex)
data$beta60 <- data$`Beta60 (g/kg/h)`
data$weight <- data$`Weight (kg)`
data$height <- data$`Height (cm)`
data$age <- data$`Age (years)`
data$beta <- -data$beta60
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%% BETA EDA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# beta vs weight
weight_plot <- ggplot(data, aes(x=weight, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 1: β vs Weight", x="Weight (kg)", y="β Elimination Rate (g/kg/h)")
# beta vs height
height_plot <- ggplot(data, aes(x=height, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 2: β vs Height", x="Height (cm)", y="β Elimination Rate (g/kg/h)")
# beta vs age
age_plot <- ggplot(data, aes(x=age, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 3: β vs Age", x="Age (years)", y="β Elimination Rate (g/kg/h)")
# beta vs gender
sex_plot <- ggplot(data, aes(x = sex, y = beta60,
fill = sex)) +
geom_violin(alpha = 0.8) +
geom_boxplot(width = 0.2, color = "navy", alpha = 0.7) +
scale_fill_manual(values = c("seagreen", "skyblue")) +
labs(
title = "Figure 4: β vs Gender",
x = "Gender",
y = "β Elimination Rate (g/kg/h)"
) +
guides(fill = "none")
# Compute correlation and p-value between beta and each characteristic.
corr_table <- data %>%
dplyr::select(beta, weight, age, height) %>%
pivot_longer(cols = c(weight, age, height), names_to = "Characteristic", values_to = "Value") %>%
group_by(Characteristic) %>%
summarise(
Correlation = cor(beta, Value, use = "pairwise.complete.obs"),
P_value = cor.test(beta, Value)$p.value
) %>%
mutate(
Correlation = round(Correlation, 3),
P_value = signif(P_value, 3)
)
#%%%%%%%%%%%%%%%%%%%% BETA DISTRIBUTION TESTING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Density plot of beta.
density_plot <- ggplot(data, aes(x = beta)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.005,
fill = "steelblue",
alpha = 0.55) +
geom_density(color = "navy",
size = 1) +
geom_vline(aes(xintercept=mean(beta)),
linetype = "dashed",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(beta, 0.025)),
linetype = "dotted",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(beta, 0.975)),
linetype = "dotted",
color = "red",
size = 2) +
labs(x = expression(""*beta), y = "Density",
title = expression("Distribution of "*beta))
# compute group statistics
stats <- data %>%
group_by(sex) %>%
summarize(
mean = mean(beta),
q025 = quantile(beta, 0.025),
q975 = quantile(beta, 0.975)
)
# helper: color mapping for male/female lines
stats <- stats %>%
mutate(line_colour = ifelse(sex == "female", "red", "blue"))
# custom legend labels for quantile lines
legend_lines <- data.frame(
sex = c("male", "female"),
color = c("blue", "red"),
linetype = c("dotted", "dotted"),
label = c("97.5th quantile (male)", "97.5th quantile (female)")
)
# main plot
sex_density_plot <- ggplot(data, aes(x = beta, fill = sex)) +
geom_histogram(aes(y = after_stat(density), color = sex),
binwidth = 0.005, alpha = 0.5, position = "identity") +
geom_density(aes(color = sex), size = 1, alpha = 0) +
# mean lines (dashed, red/blue)
geom_vline(data = stats,
aes(xintercept = mean, color = line_colour, linetype = "Mean"),
size = 2) +
# lower quantile (dotted)
geom_vline(data = stats,
aes(xintercept = q025, color = line_colour, linetype = "2.5th quantile"),
size = 2) +
# upper quantile (dotted)
geom_vline(data = stats,
aes(xintercept = q975, color = line_colour, linetype = "97.5th quantile"),
size = 2) +
# consistent manual color/fill mapping
scale_fill_manual(values = c("male" = "skyblue", "female" = "seagreen"),
name = "Sex") +
scale_color_manual(
name = "Quantile Colour",
values = c("male" = "skyblue", "female" = "seagreen",
"red" = "red", "blue" = "blue"),
breaks = c("blue", "red"),
labels = c("male quantiles (blue)", "female quantiles (red)")
) +
scale_linetype_manual(
name = "Statistic",
values = c("Mean" = "dashed", "2.5th quantile" = "dotted", "97.5th quantile" = "dotted")
) +
labs(
x = expression(beta),
y = "Density",
title = expression("Distribution of "*beta*" by Sex")
)
# Residual plots of fitted distributions
normal_fit <- fitdist(data$beta, distr = "norm", method = "mle")
beta_fit <- fitdist(data$beta, distr = "beta", method = "mle")
gamma_fit <- fitdist(data$beta, distr = "gamma", method = "mle")
# plot each fitted distribution on residual plot
plot.legend <- c("Normal", "Beta", "Gamma")
denscomp(list(normal_fit, beta_fit, gamma_fit), legendtext = plot.legend)
qqcomp(list(normal_fit, beta_fit, gamma_fit), legendtext = plot.legend)
cdfcomp(list(normal_fit, beta_fit, gamma_fit), legendtext = plot.legend)
ppcomp(list(normal_fit, beta_fit, gamma_fit), legendtext = plot.legend)
# modelling as gamma and finding 2.5% quantile
gamma_fit <- fitdist(-data$beta60, distr = "gamma", method = "mle")
summary(gamma_fit)
par(mar=c(1, 1, 1, 1))
plot(gamma_fit)
gamma_shape <- gamma_fit$estimate[[1]]
gamma_rate <- gamma_fit$estimate[[2]]
q025 <- -qgamma(0.975, gamma_shape, gamma_rate)
gamma_df <- data.frame(
x = data$beta,
density = dgamma(data$beta, shape = gamma_shape, rate = gamma_rate)
)
quantiles <- qgamma(c(0.025, 0.5, 0.975), gamma_shape, gamma_rate)
stats_gamma <- data.frame(
Quantile = rep(c("q025", "q50", "q975")),
value = quantiles
)
# plot gamma fit for beta.
gamma_plot <- ggplot(data, aes(x = beta)) +
geom_histogram(
aes(y = after_stat(density)),
binwidth = 0.005,
fill = "steelblue",
alpha = 0.5,
position = "identity") +
geom_line(
data = gamma_df,
aes(x = x,
y = density
),
color = "navy",
size = 1.2) +
geom_vline(
data = stats_gamma,
aes(xintercept = value,
linetype = Quantile),
color = "red",
size = 2
) +
scale_linetype_manual(
values = c("q025" = "dotted", "q50" = "dashed", "q975" = "dotted"),
labels = c("2.5%", "50%", "97.5%")
) +
labs(
x = expression(beta),
y = "Density",
title = expression("Gamma Distribution of "*beta),
linetype = "Quantile"
)
m_data <- data %>% filter(Sex == "male")
m_gamma_fit <- fitdist(m_data$beta, distr = "gamma", method = "mle")
m_gamma_shape <- m_gamma_fit$estimate[[1]]
m_gamma_rate <- m_gamma_fit$estimate[[2]]
m_quantiles <- qgamma(c(0.025, 0.5, 0.975), m_gamma_shape, m_gamma_rate)
f_data <- data %>% filter(Sex == "female")
f_gamma_fit <- fitdist(f_data$beta, distr = "gamma", method = "mle")
f_gamma_shape <- f_gamma_fit$estimate[[1]]
f_gamma_rate <- f_gamma_fit$estimate[[2]]
f_quantiles <- qgamma(c(0.025, 0.5, 0.975), f_gamma_shape, f_gamma_rate)
gamma_sex_df <- data.frame(
x = data$beta,
male_density = dgamma(data$beta, shape = m_gamma_shape, rate = m_gamma_rate),
female_density = dgamma(data$beta, shape = f_gamma_shape, rate = f_gamma_rate)
)
gamma_sex_long <- gamma_sex_df %>%
pivot_longer(cols = c(male_density, female_density),
names_to = "Sex",
values_to = "density") %>%
mutate(Sex = ifelse(Sex == "male_density", "male", "female"))
stats_gamma_sex <- data.frame(
Sex = rep(c("male", "female"), each = 3),
Quantile = rep(c("q025", "q50", "q975"), times = 2),
value = c(m_quantiles, f_quantiles)
)
# gamma fit of beta by sex
sex_gamma_plot <- ggplot(data, aes(x = beta)) +
# histogram with correct fill colors
geom_histogram(
aes(y = after_stat(density), fill = Sex),
binwidth = 0.005, alpha = 0.5, position = "identity"
) +
# gamma density curves (match fill colours)
geom_line(
data = gamma_sex_long,
aes(x = x, y = density, color = Sex, linetype = "Gamma density"),
size = 1.2
) +
# quantile lines (male = blue, female = red)
geom_vline(
data = stats_gamma_sex %>%
mutate(line_colour = ifelse(Sex == "female", "red", "blue")),
aes(xintercept = value, color = line_colour, linetype = Quantile),
size = 2
) +
# color scales (fill = distribution colour, line = quantile/gamma)
scale_fill_manual(
values = c("male" = "skyblue", "female" = "seagreen"),
name = "Sex"
) +
scale_color_manual(
name = "Quantile Colour",
values = c("male" = "skyblue", "female" = "seagreen",
"red" = "red", "blue" = "blue"),
breaks = c("skyblue", "seagreen", "blue", "red"),
labels = c("Male gamma curve", "Female gamma curve",
"Male quantiles (blue)", "Female quantiles (red)")
) +
# line type labels for quantiles
scale_linetype_manual(
name = "Statistic",
values = c("Gamma density" = "solid",
"q025" = "dotted",
"q50" = "dashed",
"q975" = "dotted"),
labels = c("Gamma density" = "Gamma density curve",
"q025" = "2.5th quantile",
"q50" = "50th quantile (median)",
"q975" = "97.5th quantile")
) +
labs(
x = expression(beta),
y = "Density",
title = expression("Gamma Distributions of "*beta*" by Sex")
)
#%%%%%%%%%%%%%%%%%%%%%%% LINEAR MODELLING BETA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# --- linear modelling beta 60 attempt ---
# model incorporating all characteristics
lm_model <- lm(beta ~ weight + age + height + sex, data = data)
# Residual plots
par(mfrow = c(2,2))
plot(lm_model)
# Try modelling square root of beta (residual plots indicate
# a slight quadratic mean-variance relationship)
sqrt_lm_model <- lm(sqrt(beta) ~ weight + age + height + sex, data = data)
# Residual plots
par(mfrow = c(2,2))
plot(sqrt_lm_model)
# Create predicted values
pred_data <- data
# quantiles for lm
resid_q025 <- quantile(residuals(lm_model), probs = 0.025)
pred_data$lm_pred <- predict(lm_model)
pred_data$lm_q025 <- pred_data$lm_pred + resid_q025
# using quantile regression instead
rq_model <- rq(beta ~ weight + age + height + sex, data = data, tau = 0.025)
pred_data$rq_pred <- predict(rq_model)
# Combine into a long format for ggplot
plot_data <- pred_data %>%
tidyr::pivot_longer(cols = c(lm_q025, rq_pred),
names_to = "model",
values_to = "predicted_beta")
# Plot
quantile_v_lm_plot <- ggplot(plot_data, aes(x = beta, y = predicted_beta, color = model)) +
geom_point( size = 3) +
geom_smooth(method = "lm", se = TRUE) +
labs(
title = expression("Comparing Models for the 2.5% Quantile for "*beta),
x = expression("Actual "*beta),
y = expression("Predicted "*beta),
color = "Model Type"
) +
scale_color_manual(values = c("lm_q025" = "orange", "rq_pred" = "darkred"),
labels = c("Linear Regression", "Quantile Regression"))
# compute group statistics
stats <- data %>%
group_by(Sex) %>%
summarize(
mean = mean(beta60),
q025 = quantile(beta60, 0.025),
q975 = quantile(beta60, 0.975)
)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% TASK 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# setup from question
test_person <- data.frame(weight = 70, height = 160, age = 70, sex = "female")
Ct <- 0.15
t <- 2
# empirical quantiles for Beta60 (population approach)
beta_pop <- quantile(data$beta, probs = c(0.025, 0.5, 0.975))
# C0 values using population quantiles
C0_pop <- Ct + beta_pop * t
# gamma fitted whole population distribution
gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), gamma_shape, gamma_rate)
C0_gamma <- Ct + gamma_quantiles * t
# gamma fitted to males
m_gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), m_gamma_shape, m_gamma_rate)
C0_gamma_m <- Ct + m_gamma_quantiles * t
# gamma fitted to females
f_gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), f_gamma_shape, f_gamma_rate)
C0_gamma_f <- Ct + f_gamma_quantiles * t
# predicted Beta using regression model
lm_pred_vals <- predict(lm_model, newdata = test_person, interval = "prediction", level = 0.95)
# C0 values from regression model
beta_pred_lm <- as.numeric(lm_pred_vals)
names(beta_pred_lm) <- c("Fitted", "Lower (PI 2.5%)", "Upper (PI 97.5%)")
C0_lm <- Ct + beta_pred_lm * t
# predicted beta using quantile regression model
rq_pred_vals <- predict(rq_model, newdata = test_person, interval = "confidence", level = 0.95)
C0_rq <- Ct + rq_pred_vals * t
# combine results into one comparison table
results_table <- tibble(
Approach = c("Population (empirical quantiles)", "Gamma Fitted", "Male Gamma Fitted", "Female Gamma Fitted", "Linear Regression", "Quantile Regression"),
Lower_2.5 = c(C0_pop[1], C0_gamma[1], C0_gamma_m[1], C0_gamma_f[1], C0_lm[2], C0_rq[2]), # note ordering: lwr=2nd col in pred
Central = c(C0_pop[2], C0_gamma[2], C0_gamma_m[2], C0_gamma_f[2], C0_lm[1], C0_rq[1]),
Upper_97.5 = c(C0_pop[3], C0_gamma[3], C0_gamma_m[3], C0_gamma_f[3], C0_lm[3], C0_rq[3])
)
# %%%%%%%%%%%%%%%%%%%%%%%%%% ALTERNATE RESULT APPROACHES %%%%%%%%%%%%%%%%%%%%
# Given parameters
Ct <- 0.15 # measured concentration (g/kg)
t <- 2 # time since arrest (hours)
x <- 0.47 # legal limit (g/kg)
# Fitted Gamma parameters for beta
gamma_shape <- 31.9538
gamma_rate <- 173.7079
# 1. Compute 2.5th and 97.5th percentiles of beta
beta_q025 <- qgamma(0.025, shape = gamma_shape, rate = gamma_rate)
beta_q975 <- qgamma(0.975, shape = gamma_shape, rate = gamma_rate)
# 2. Compute 95% interval for C0
C0_lower <- Ct + beta_q025 * t
C0_upper <- Ct + beta_q975 * t
C0_interval <- c(C0_lower, C0_upper)
# 3. Compute probability that true C0 exceeded legal limit
beta_threshold <- (x - Ct) / t
p_over_limit <- 1 - pgamma(beta_threshold, shape = gamma_shape, rate = gamma_rate)
# 4. Monte Carlo simulation
set.seed(123)
n_sim <- 1000
beta_samples <- rgamma(n_sim, shape = gamma_shape, rate = gamma_rate)
C0_samples <- Ct + beta_samples * t
p_over_limit_sim <- mean(C0_samples > x)
# Summaries
C0_mean <- mean(C0_samples)
C0_sd <- sd(C0_samples)
# Create summary table
results <- data.frame(
Statistic = c(
"Lower 95% bound for C₀",
"Upper 95% bound for C₀",
"Analytical P(C₀ > 0.47 g/kg)",
"Simulated P(C₀ > 0.47 g/kg)",
"Mean of simulated C₀",
"SD of simulated C₀"
),
Value = c(
round(C0_lower, 3),
round(C0_upper, 3),
round(p_over_limit, 3),
round(p_over_limit_sim, 3),
round(C0_mean, 3),
round(C0_sd, 3)
)
)
# Parameters
gamma_shape <- 31.9538
gamma_rate <- 173.7079
# Compute 2.5th and 97.5th percentiles of beta
beta_q025 <- qgamma(0.025, shape = gamma_shape, rate = gamma_rate)
beta_q975 <- qgamma(0.975, shape = gamma_shape, rate = gamma_rate)
# Current concentration
C_t <- 0.15
t <- 2
# Legal limit for C0
C_legal <- 0.47
# Derived beta that would reach legal limit
beta_limit <- (C_legal - C_t)/t
# Define range of beta
# beta <- seq(0, 0.5, length.out = 1000)
beta <- seq(beta_q025, beta_q975, length.out = 1000)
# Calculate C0
C0 <- C_t + beta * t
# Gamma PDF for beta
pdf_beta <- dgamma(beta, shape = gamma_shape, rate = gamma_rate)
# Plot C0 vs beta
plot(beta, C0, type = "l", lwd = 2, col = "blue",
xlab = expression(beta), ylab = expression(C[0]),
main = expression(paste(C[0], " vs ", beta)))
# Add vertical line at derived beta_limit
abline(v = beta_limit, col = "darkgreen", lwd = 2, lty = 2)
# Shading for beta values that make C0 > C_legal
beta_shade <- beta[beta > beta_limit]
C0_shade <- C0[beta > beta_limit]
polygon(c(beta_shade, rev(beta_shade)),
c(rep(min(C0), length(beta_shade)), rev(C0_shade)),
col = rgb(1, 0, 0, 0.3), border = NA)
# Add Gamma PDF on secondary y-axis
par(new = TRUE)
plot(beta, pdf_beta, type = "l", lwd = 2, col = "red",
axes = FALSE, xlab = "", ylab = "")
axis(side = 4, col = "red", col.axis = "red")
mtext("Density", side = 4, line = 3, col = "red")
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% TASK 3 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# define A, Co, Vd
data$A <- data$`Amount of Alcohol Consumed (g)`
data$Co <- data$`Co (g/Kg)`
data$Vd <- data$A/(data$Co *data$weight)
# view summary and quantiles of Vd
summary(data$Vd)
quantile(data$Vd, probs = c(0.025, 0.5, 0.975))
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Vd EDA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# histogram of Vd
density_plot <- ggplot(data, aes(x = Vd)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.005,
fill = "steelblue",
alpha = 0.55) +
geom_density(color = "navy",
size = 1) +
geom_vline(aes(xintercept=mean(Vd)),
linetype = "dashed",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(Vd, 0.025)),
linetype = "dotted",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(Vd, 0.975)),
linetype = "dotted",
color = "red",
size = 2) +
labs(x = "Vd (L/kg)", y = "Density", title = expression("Distribution of Vd"))
# Vd vs weight
weight_plot2 <- ggplot(data, aes(x = weight, y = Vd, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method = "lm", colour = "navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title = "Figure 6: Vd vs Weight", x = "Weight (kg)", y = "Vd (L/kg)")
# Vd vs height
height_plot2 <- ggplot(data, aes(x = height, y = Vd, colour = sex)) +
geom_point(size = 3) +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
geom_smooth(method = "lm", colour = "navy") +
labs(title = "Figure 7: Vd vs Height", x = "Height (cm)", y = "Vd (L/kg)")
# Vd vs age
age_plot2 <- ggplot(data, aes(x = age, y = Vd, colour = sex)) +
geom_point(size = 3) +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
geom_smooth(method = "lm", colour = "navy") +
labs(title = "Figure 8: Vd vs Age", x = "Age (years)", y = "Vd (L/kg)")
# Vd vs gender
sex_plot2 <- ggplot(data, aes(x = sex, y = Vd,
fill = sex)) +
geom_violin(alpha = 0.8) +
geom_boxplot(width = 0.2, color = "navy", alpha = 0.7) +
scale_fill_manual(values = c("seagreen", "skyblue")) +
labs(
title = "Figure 9: Vd vs Gender",
x = "Gender",
y = "Vd (L/kg)"
) +
guides(fill = "none")
# Test correlation to investigate independent assumption
cor.test(data$beta, data$Vd, use = "complete.obs")
# View quantiles of each coefficient for comparison
beta_range <- quantile(data$beta, probs = c(0.025, 0.975))
Vd_range <- quantile(data$Vd, probs = c(0.025, 0.975))
# Scatter plot between beta and V_d
ggplot(data, aes(x = beta, y = Vd, colour = sex)) +
geom_point() +
scale_colour_manual(values = c("male" = "steelblue", "female" = "lightblue")) +
geom_smooth(method = "lm", color = "navy") +
labs(
title = "Figure 10: Relationship between β and Vd",
subtitle = paste("Correlation =", round(cor(data$beta, data$Vd, use = "complete.obs"), 3)),
x = "β elimination rate (g/kg/h)",
y = "Vd (L/kg)"
)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ALT APPROACH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Visualize the joint distribution
ggplot(data, aes(x = beta, y = Vd)) +
geom_point(alpha = 0.6, color = "orange", size = 3) +
geom_density_2d(color = "red") +
# Add the marginal 97.5th percentiles
geom_vline(xintercept = quantile(data$beta, 0.975, na.rm = TRUE),
linetype = "dashed", color = "blue", linewidth = 1) +
geom_hline(yintercept = quantile(data$Vd, 0.975, na.rm = TRUE),
linetype = "dashed", color = "blue", linewidth = 1) +
labs(
title = "Joint Distribution of β and Vd",
subtitle = "Blue lines show marginal 97.5th percentiles",
x = "β (g/kg/h)",
y = "Vd (L/kg)"
) +
annotate("text",
x = quantile(data$beta, 0.975, na.rm = TRUE),
y = min(data$Vd, na.rm = TRUE),
label = "97.5th percentile β", size = 7,
hjust = -0.1, color = "blue")
# Test person (from task 2)
A <- mean(data$A) # Chose this kinda randomly but made the most sense to me
weight <- 70
t <- 2
# Calculate Ct using empirical joint distribution
data$Ct_joint <- (A / (weight * data$Vd)) - data$beta * t
# Compute the quantiles of C_t
quantile(data$Ct_joint, probs = c(0.025, 0.5, 0.975), na.rm = TRUE)
# Current (independent) approach
beta_ind <- quantile(data$beta, 0.975, na.rm = TRUE)
Vd_ind <- quantile(data$Vd, 0.975, na.rm = TRUE)
Ct_independent <- (A / (weight * Vd_ind)) - beta_ind * t
# Table comparison
results_compare <- tibble(
Method = c("Empirical joint (β,Vd)", "Independent 97.5th percentiles"),
Lower_2.5 = c(round(quantile(data$Ct_joint, 0.025, na.rm = TRUE), 3), ""),
Median = c(round(quantile(data$Ct_joint, 0.5, na.rm = TRUE), 3), ""),
Upper_97.5 = c(round(quantile(data$Ct_joint, 0.975, na.rm = TRUE), 3),
round(Ct_independent, 3))
)